Our goal here to explore the relationship between the variables of the “mtcars” dataset. Our research will focus on exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). We are particularly interested in the following two questions:
require(datasets)
require(ggplot2)
## Loading required package: ggplot2
require(GGally)
## Loading required package: GGally
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
require(ggthemes)
## Loading required package: ggthemes
require(plotly)
## Loading required package: plotly
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(ggcorrplot)
## Loading required package: ggcorrplot
## Warning: package 'ggcorrplot' was built under R version 4.0.5
require(Amelia)
## Loading required package: Amelia
## Warning: package 'Amelia' was built under R version 4.0.5
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.7.6, built: 2019-11-24)
## ## Copyright (C) 2005-2021 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
require(leaps)
## Loading required package: leaps
cars <- mtcars
head(cars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
missmap(cars) # to check the missing values
There is no missing values in the dataset but, we need to change some of the varible types.
cars$cyl <- as.factor(cars$cyl)
cars$vs <- as.factor(cars$vs)
cars$am <- as.factor(cars$am)
str(cars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# To check the correlation b/w the variables
plot1 <- ggplotly(ggcorr(mtcars, nbreaks = 4, palette = "RdGy"))
plot1
plot2 <- ggplot(cars, aes(mpg)) +
geom_histogram(aes(fill=am),binwidth = 5) +facet_grid(.~am)+
labs(title = "Histogram of Miles per Gallon",
x='"Transmission Type: 0 = Automatic ; 1
= Manual"', y = "Frequency")
ggplotly(plot2)
plot3 <- ggplot(data = cars, aes(x = am, y = mpg)) +
geom_boxplot(aes(fill=factor(cyl))) +
labs(x='"Transmission Type: 0 = Automatic ; 1
= Manual"',y="Miles per Gallon",
title='MPG by Transmission Type')
ggplotly(plot3)
aggregate(mpg ~ am, data = mtcars, mean)
## am mpg
## 1 0 17.14737
## 2 1 24.39231
Correlation Analysis - As we can see in the correlation graph, that “Miles per Gallon” have - correlation with cyl, hp, and wt. That means car mileage will decrease as car weight and horsepower increase.
Histogram Analysis - Manual transmission cars have higher MPG than automatic transmission.
Boxplot Analysis - Manual transmission is 7.25 MPG higher than automatic transmission.
Here I will use the Best Subset Regression method to check which variable affects the mpg more. To run the regression with the best subset selection, we will use the regsubsets function from the package leaps
fit <- regsubsets(mpg~., cars, nvmax = 11)
# nvmax is the maximum number of predictors to be included in the model by default it is set to 8
sum <- summary(fit)
#to assess the goodness-of-fit we will use the adjusted R square
# because it is automatically computed for each model
sum$adjr2
## [1] 0.7445939 0.8148396 0.8335561 0.8413067 0.8478136 0.8440579 0.8412051
## [8] 0.8378688 0.8319505 0.8252176 0.8164807
which.max(sum$adjr2)# find the model with maximum adjusted R squared
## [1] 5
coef(fit, 5) # To print the coefficients of this model
## (Intercept) cyl6 hp wt vs1 am1
## 31.28240981 -2.20519611 -0.03393442 -2.36781111 1.87741318 2.62111773
model <- lm(mpg~ cyl+hp+wt+vs+am, data=cars)# Now applying the regression model on best fit coefficients
summary(model)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + vs + am, data = cars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3405 -1.2158 0.0046 0.9389 4.6354
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.18461 3.42002 9.118 2e-09 ***
## cyl6 -2.09011 1.62868 -1.283 0.2112
## cyl8 0.29098 3.14270 0.093 0.9270
## hp -0.03475 0.01382 -2.515 0.0187 *
## wt -2.37337 0.88763 -2.674 0.0130 *
## vs1 1.99000 1.76018 1.131 0.2690
## am1 2.70384 1.59850 1.691 0.1032
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.397 on 25 degrees of freedom
## Multiple R-squared: 0.8724, Adjusted R-squared: 0.8418
## F-statistic: 28.49 on 6 and 25 DF, p-value: 5.064e-10
The results suggest that the best model includes cyl6, cyl8, hp, wt, and manual variables. This model explains about 86.59% of the variance. Cylinders change negatively with mpg (-3.03miles and -2.16miles for cyl6 and cyl8, respectively), so do with horsepower (-0.03miles) and weight (-2.5miles for every 1,000lb). On the other hand, the manual transmission is 1.81mpg better than the automatic transmission.
On average, the manual transmission is better than the automatic transmission by 1.81mpg. However, transmission type is not the only factor accounting for MPG; cylinders, horsepower, and weight are the important factors affecting the MPG.